ATOM Documentation

V2 Graph Architecture: PostgreSQL-Backed GraphRAG

Executive Summary

This document outlines the architecture for the **V2 GraphRAG Engine**, which replaces previous in-memory (NetworkX) and disk-based embedded prototypes. The V2 engine leverages **PostgreSQL** for graph storage and traversal, and **LanceDB** for high-dimensional vector search, providing a stateless, scalable, and ACID-compliant foundation for the ATOM Cloud platform.

1. The Core Problem

Previous iterations of GraphRAG (V1) were **stateful and in-memory**.

  • **Scaling Limit**: Loading an entire graph into Python RAM was viable only for small tenants.
  • **Resource Competition**: As workspace density increased, resident memory growth on ATOM Cloud Elastic Nodes risked out-of-memory (OOM) crashes.
  • **Consistency**: In-memory graphs lacked the transactional integrity required for enterprise-grade knowledge management.

2. The Solution: Relational Graph (V2)

The V2 engine treats the knowledge graph as a **stateless relational structure**.

Components

  1. **Graph Store**: **PostgreSQL**
     • **Models**: GraphNode and GraphEdge.
     • **Traversal**: Uses **SQL Recursive Common Table Expressions (CTEs)** to perform deep graph traversal directly in the database.
     • **Persistence**: Managed ATOM Cloud PostgreSQL instances.
  2. **Vector Store**: **LanceDB**
     • Stores text embeddings for semantic retrieval.
     • **Sync**: Episodes and extracted entities are mirrored to LanceDB for hybrid search capabilities (keywords + semantics + relationships).
  3. **Extraction Pipeline**: **Background Workers**
     • **Workflow**: Document Ingest → LLM Entity Extraction → Relation Mapping → PostgreSQL/LanceDB Upsert.
     • **Orchestration**: Triggered via webhooks or manual ingestion events.
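
To make the two graph models concrete, here is a minimal sketch that uses plain dataclasses as stand-ins. The field names (node_id, kind, name, source_id, target_id, relation) and the outgoing helper are assumptions for illustration; the actual GraphNode and GraphEdge models in core.models may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphNode:
    # All field names here are hypothetical, chosen for illustration only.
    tenant_id: str  # workspace ID that owns this node
    node_id: str
    kind: str       # e.g. "user", "project", "role"
    name: str


@dataclass(frozen=True)
class GraphEdge:
    tenant_id: str
    source_id: str
    target_id: str
    relation: str   # e.g. "works_on"


def outgoing(node, edges):
    """Return the edges that start at `node` within the same tenant."""
    return [
        e for e in edges
        if e.tenant_id == node.tenant_id and e.source_id == node.node_id
    ]


alice = GraphNode("ws1", "alice", "user", "Alice")
edges = [
    GraphEdge("ws1", "alice", "project_x", "works_on"),
    GraphEdge("ws2", "alice", "secret", "works_on"),  # other workspace
]
print(outgoing(alice, edges))
```

Carrying tenant_id on both records keeps every lookup scoped to one workspace, even when two workspaces reuse the same node identifier.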

3. Deployment Architecture

Storage Layout

| Component | Storage Backend | ATOM Cloud Implementation | Performance Detail |
| --- | --- | --- | --- |
| **Nodes & Edges** | PostgreSQL | core.models.GraphNode/Edge | Index-optimized recursive lookups. |
| **Semantic Vectors** | LanceDB | Cloud Object Storage (S3/R2) | Sub-millisecond vector similarity. |
| **Processing** | Workers | Elastic Compute Nodes | Asynchronous extraction via TaskQueue. |

Traversal Strategy (Recursive CTE)

Unlike traditional graph databases that require specialized query languages (Cypher/Gremlin), the V2 engine uses standard SQL Recursive CTEs. This allows the application to remain **stateless**:

  • No "sticky" sessions required.
  • Any API node can perform N-hop traversals.
  • Results are streamed directly from the database to the reasoning engine.
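
An N-hop traversal of this kind can be sketched as follows. To stay runnable without a database server, the example uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The graph_edge table, its columns, and the n_hop helper are assumptions, not the engine's actual schema. PostgreSQL accepts the same WITH RECURSIVE form, though the driver's placeholder syntax differs and the seed parameter may need an explicit cast.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE graph_edge (
    tenant_id TEXT NOT NULL,
    source_id TEXT NOT NULL,
    target_id TEXT NOT NULL,
    relation  TEXT NOT NULL
);
CREATE INDEX idx_edge_tenant_source ON graph_edge (tenant_id, source_id);
""")
conn.executemany(
    "INSERT INTO graph_edge VALUES (?, ?, ?, ?)",
    [
        ("ws1", "alice", "project_x", "works_on"),
        ("ws1", "project_x", "team_a", "owned_by"),
        ("ws1", "team_a", "alice", "includes"),  # closes a cycle
        ("ws1", "team_a", "bob", "includes"),
        ("ws2", "alice", "secret", "works_on"),  # different tenant
    ],
)

N_HOP_SQL = """
WITH RECURSIVE walk(node_id, depth) AS (
    SELECT :start, 0                      -- seed row: the starting node
    UNION
    SELECT e.target_id, w.depth + 1       -- follow one outgoing edge
    FROM walk AS w
    JOIN graph_edge AS e
      ON e.source_id = w.node_id AND e.tenant_id = :tenant
    WHERE w.depth < :max_depth            -- depth bound ends cyclic walks
)
SELECT node_id, MIN(depth) AS depth
FROM walk
GROUP BY node_id
ORDER BY depth, node_id;
"""


def n_hop(conn, tenant, start, max_depth):
    """Return (node_id, shortest_depth) pairs reachable within max_depth hops."""
    params = {"tenant": tenant, "start": start, "max_depth": max_depth}
    return conn.execute(N_HOP_SQL, params).fetchall()


print(n_hop(conn, "ws1", "alice", 2))
# [('alice', 0), ('project_x', 1), ('team_a', 2)]
```

Because the whole walk is a single SQL statement, the caller holds no graph state between requests. The depth bound matters: without it, the cycle back to alice would keep producing new rows.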

4. Multi-Tenancy & Isolation

Isolation is enforced at the database layer:

  • **Tenant Partitioning**: All graph tables include a tenant_id (Workspace ID) column with mandatory indexing.
  • **Query Lifecycle**: Every GraphRAG query is automatically scoped to the active tenant_id context via the ServiceFactory and GraphRAGEngine.
  • **Encryption**: Data at rest is encrypted using workspace-specific keys where requested.
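
One way to picture automatic tenant scoping is a small wrapper that binds the workspace once, so that no query can be issued without it. This is only a sketch under assumptions: TenantScopedGraph, its neighbors method, and the graph_edge schema are hypothetical, and sqlite3 again stands in for PostgreSQL. The source does not show how ServiceFactory and GraphRAGEngine actually apply the scope.

```python
import sqlite3


class TenantScopedGraph:
    """Hypothetical helper: every query it issues is filtered by tenant_id."""

    def __init__(self, conn, tenant_id):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._conn = conn
        self._tenant_id = tenant_id

    def neighbors(self, node_id):
        # tenant_id comes from the bound context, never from the caller,
        # so a forgotten filter cannot leak another workspace's rows.
        rows = self._conn.execute(
            "SELECT target_id FROM graph_edge "
            "WHERE tenant_id = ? AND source_id = ? ORDER BY target_id",
            (self._tenant_id, node_id),
        )
        return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE graph_edge (
    tenant_id TEXT NOT NULL,
    source_id TEXT NOT NULL,
    target_id TEXT NOT NULL
);
CREATE INDEX idx_edge_tenant ON graph_edge (tenant_id, source_id);
""")
conn.executemany(
    "INSERT INTO graph_edge VALUES (?, ?, ?)",
    [("ws1", "alice", "project_x"), ("ws2", "alice", "secret")],
)

ws1 = TenantScopedGraph(conn, "ws1")
print(ws1.neighbors("alice"))  # ['project_x']
```

The same node ID exists in both workspaces here, yet each scoped handle sees only its own edges.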

5. Ingestion Workflow

Extraction is handled asynchronously by ATOM background workers to ensure the main API remains responsive:

  1. **Event**: A document is uploaded or a webhook is received.
  2. **Task**: A graph_ingest task is queued for processing.
  3. **Extraction**: The worker uses the **LLM Extraction Service** to identify entities (users, projects, roles) and relationships.
  4. **Verification**: Entities are matched against core schemas (canonical resolution).
  5. **Commit**: The worker performs a transactional upsert into PostgreSQL and syncs the vector embeddings to LanceDB.
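
The five steps above can be sketched end to end as follows. Everything here is a stand-in chosen to keep the example runnable: a queue.Queue plays the task queue, extract is a trivial parser in place of the LLM Extraction Service, CANONICAL is a hypothetical alias table, sqlite3 replaces PostgreSQL, and a dict replaces LanceDB. The table and function names are assumptions.

```python
import queue
import sqlite3

CANONICAL = {"al": "alice"}  # hypothetical alias table for canonical resolution


def extract(text):
    """Stand-in for the LLM Extraction Service: parses 'A relation B' lines."""
    entities, relations = set(), []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            src, rel, dst = parts
            entities.update((src, dst))
            relations.append((src, rel, dst))
    return sorted(entities), relations


def canonical(name):
    """Map an extracted name to its canonical ID (lowercased if unknown)."""
    return CANONICAL.get(name.lower(), name.lower())


def run_graph_ingest(conn, vector_store, task):
    """Process one graph_ingest task: extract, resolve, upsert, then sync."""
    entities, relations = extract(task["text"])
    ws = task["workspace_id"]
    with conn:  # one transaction: commits on success, rolls back on error
        for name in entities:
            conn.execute(
                "INSERT INTO graph_node (tenant_id, node_id) VALUES (?, ?) "
                "ON CONFLICT (tenant_id, node_id) DO NOTHING",
                (ws, canonical(name)),
            )
        for src, rel, dst in relations:
            conn.execute(
                "INSERT INTO graph_edge (tenant_id, source_id, target_id, relation) "
                "VALUES (?, ?, ?, ?) "
                "ON CONFLICT (tenant_id, source_id, target_id, relation) DO NOTHING",
                (ws, canonical(src), canonical(dst), rel),
            )
    # Vector sync runs only after the graph commit succeeded.
    vector_store[(ws, task["doc_id"])] = task["text"]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE graph_node (
    tenant_id TEXT NOT NULL,
    node_id   TEXT NOT NULL,
    PRIMARY KEY (tenant_id, node_id)
);
CREATE TABLE graph_edge (
    tenant_id TEXT NOT NULL,
    source_id TEXT NOT NULL,
    target_id TEXT NOT NULL,
    relation  TEXT NOT NULL,
    PRIMARY KEY (tenant_id, source_id, target_id, relation)
);
""")

tasks = queue.Queue()
vector_store = {}
doc = {"workspace_id": "ws1", "doc_id": "d1", "text": "Al works_on Project_X"}
tasks.put(doc)
tasks.put(doc)  # simulate the same task being delivered twice
while not tasks.empty():
    run_graph_ingest(conn, vector_store, tasks.get())
```

Keying the upsert on a unique constraint makes a redelivered task harmless: the second run inserts nothing new. Syncing vectors after the commit means a failed graph write never leaves orphaned embeddings, although a failed sync would still need a retry.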

6. Implementation Reference

The core logic resides in backend-saas/core/graphrag_engine.py:

```python
class GraphRAGEngine:
    """
    PostgreSQL-backed GraphRAG Engine.
    Uses SQL Recursive CTEs for traversal (Stateless).
    """
    def ingest_document(self, workspace_id: str, doc_id: str, text: str):
        # 1. LLM Extraction (Background Task)
        # 2. Canonical matching
        # 3. SQL Upsert
        pass

    def query(self, workspace_id: str, query_text: str, mode: str = "local"):
        # 1. Semantic search in LanceDB
        # 2. Graph traversal in PostgreSQL via Recursive CTEs
        # 3. Hybrid result synthesis
        pass
```
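
The three query steps can be illustrated with a toy, fully in-memory sketch. It is not the engine's implementation: EMBEDDINGS and EDGES are hypothetical stand-ins for LanceDB and the PostgreSQL edge table, the breadth-first expansion stands in for the recursive CTE, and the source does not specify how hybrid synthesis is done. Here it simply returns each reached entity with its hop distance from the nearest seed.

```python
import math

# Toy stand-ins; the real vectors live in LanceDB and edges in PostgreSQL.
EMBEDDINGS = {"alice": [1.0, 0.0], "project_x": [0.8, 0.6], "bob": [0.0, 1.0]}
EDGES = {"alice": ["project_x"], "project_x": ["team_a"], "bob": []}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_seeds(query_vec, k):
    """Step 1: pick the k entities whose embeddings are closest to the query."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda name: cosine(query_vec, EMBEDDINGS[name]),
        reverse=True,
    )
    return ranked[:k]


def expand(seeds, max_depth):
    """Step 2: breadth-first expansion, recording each node's hop distance."""
    seen = {seed: 0 for seed in seeds}
    frontier = list(seeds)
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for node in frontier:
            for neighbor in EDGES.get(node, []):
                if neighbor not in seen:
                    seen[neighbor] = depth
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen


def hybrid_query(query_vec, k=1, max_depth=1):
    """Step 3 (simplified): return seeds plus their graph neighborhood."""
    return expand(semantic_seeds(query_vec, k), max_depth)


print(hybrid_query([1.0, 0.0], k=1, max_depth=2))
# {'alice': 0, 'project_x': 1, 'team_a': 2}
```

The split shown here is the main design point: vector similarity chooses where to enter the graph, and traversal supplies the related context around those entry points.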